The goal of this exercise is to consolidate and practically apply the data processing skills discussed in class. The exercise attempts to cover several components of the course.
Your task is to (at least) reproduce the results of the econometric analysis shown below, based on the predefined data and sets of requirements, using only R programming skills (no manual copy-pasting) in R Markdown.
One of the objectives we pursue by using R, RStudio, R Markdown and other complementary tools for this project is to train the best practices and key concepts of reproducible research with R and RStudio1 and of literate programming.
Be accurate! Econometric analysis is not only about getting the numbers right but also about communicating results clearly, so this project pushes your data manipulation skills to the limit (possibly). This does not mean that one needs to report all 16 digits after the decimal point2 in tables. It does mean that you need to choose carefully what you display and what you do not, how you label plot axes and where you leave them unnamed, what explanations you provide in table footnotes, and how you name the columns.
Do not be deterred… The key point of this exercise (also reflected in the bulk of the total grade for this work) is to reproduce the actual results; the communication details account for about one fifth of the final grade. However, much of learning R happens when one attempts to clarify details and make the story straight and clear for the reader.
You are welcome to go beyond the template when preparing your homework! Please do demonstrate what you have learned in data analysis and econometrics, what packages you have discovered, mastered and used, and what improvements you have included and developed. You are not limited to the templates presented below. Any innovation may improve your grade3.
Basic rules:
Zero tolerance for plagiarism. Although reproducing the same results may imply developing similar R code, there is still great variation between authors: not only in naming conventions, but also in syntax, style and functional choices. The same operation can be done in dozens of ways in R. Make sure that you work on this individually and do not share your exact coding solutions. If you believe that many peers may have a similar answer, add a short description of where you learned it. Acknowledge whom you cooperated with in order to learn. Be creative! Please also note that in case of detected plagiarism, the examination office will be informed about the cheating attempt and the respective university examination rules will be applied.
Deadline. Three weeks after the examination period. The exact due date for this homework is on Ilias.
Grading. This homework constitutes 40% of the final grade.
Submission is on Ilias. You shall submit the .Rmd file with your solution and the rendered HTML or PDF document.
Specify your name, surname and matriculation number in the YAML header of your project.
Nearly reproduce “Table 1 - Descriptive Statistics” from (Acemoglu et al. 2001) using the data set Acemoglu2001.csv. Unfortunately, as mentioned by the authors, it is not possible to reproduce exactly the same numbers as in the paper. However, using the data set provided with this exercise, one can reproduce exactly the same data tables and plots as below.
Before reproducing the table, make sure that you follow all data cleaning steps:
## Write code for data loading here
library(tidyverse)  # provides read_csv() and the dplyr verbs used below
Acemoglu2001 <- read_csv("data/Acemoglu2001.csv")
## Rows: 164 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): iso
## dbl (9): base_sample, africa, asia, other, pgp95, hjypl, avexpr, extmort4, l...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In the data set, there are the following variables:
iso - country code;
base_sample - dummy variable indicating the countries that constitute the base sample;
africa, asia and other - dummy variables indicating whether a country is in Africa, Asia or on another continent; note that when all three dummy variables are equal to zero, the country is in Latin America;
pgp95 - GDP per capita (PPP) in 1995;
hjypl - log output per worker in 1988;
avexpr - average protection against expropriation risk, 1985–1995;
extmort4 - European settler mortality;
lat_abst - absolute value of the latitude at which the country is located.
Produce a table with descriptive statistics (mean and standard deviation) for all continuous variables present in the sample and for their logarithmic transformations computed at the data cleaning stage (continuous variables are transformed with the natural logarithm). Compute these summary statistics for the entire sample of 164 countries as well as for the sub-sample of the base countries.
dta_clean <- read_csv("data/Acemoglu2001.csv")
## Rows: 164 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): iso
## dbl (9): base_sample, africa, asia, other, pgp95, hjypl, avexpr, extmort4, l...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Develop R code here...
# Drop the last row (row 164), which is not part of the analysis
dta_clean <- slice(Acemoglu2001, -164)
# Replace NAs with the respective column means
cleaned_dta <- dta_clean %>%
  mutate(pgp95 = replace(pgp95, is.na(pgp95), mean(pgp95, na.rm = TRUE)),
         hjypl = replace(hjypl, is.na(hjypl), mean(hjypl, na.rm = TRUE)),
         avexpr = replace(avexpr, is.na(avexpr), mean(avexpr, na.rm = TRUE)),
         extmort4 = replace(extmort4, is.na(extmort4), mean(extmort4, na.rm = TRUE)))
# Log-transform the continuous variables (avexpr is kept in levels)
cleaned_log <- cleaned_dta %>%
  mutate(pgp95 = log(pgp95),
         hjypl = log(hjypl),
         extmort4 = log(extmort4))
glimpse(cleaned_log)
## Rows: 163
## Columns: 10
## $ iso <chr> "AFG", "AGO", "ARE", "ARG", "ARM", "AUS", "AUT", "AZE", "B…
## $ base_sample <dbl> 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…
## $ africa <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0…
## $ asia <dbl> 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0…
## $ other <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ pgp95 <dbl> 8.862889, 7.770645, 9.804219, 9.133459, 7.682482, 9.897972…
## $ hjypl <dbl> -1.22811001, -3.41124773, -1.22811001, -0.87227380, -1.228…
## $ avexpr <dbl> 7.066491, 5.363636, 7.181818, 6.386364, 7.066491, 9.318182…
## $ extmort4 <dbl> 4.540098, 5.634790, 5.397830, 4.232656, 5.397830, 2.145931…
## $ lat_abst <dbl> 0.36666667, 0.13666667, 0.26666668, 0.37777779, 0.44444445…
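The column-by-column NA replacement above can also be written once with `dplyr::across()` (dplyr >= 1.0). A minimal sketch on toy data; the tibble `dta` and its values are made up for illustration:

```r
library(dplyr)

# Toy stand-in for the real data; the values are hypothetical
dta <- tibble(pgp95 = c(1000, NA, 3000), avexpr = c(5, 6, NA))

# Mean-impute every listed column in a single mutate()
dta_imputed <- dta %>%
  mutate(across(c(pgp95, avexpr),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))))

dta_imputed$pgp95   # 1000 2000 3000
dta_imputed$avexpr  # 5.0 6.0 5.5
```

With the real data, the same pattern replaces the four nearly identical `replace()` lines in the cleaning chunk.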
# Column means of the continuous variables (levels)
table_means <- colMeans(cleaned_dta[, c("pgp95", "hjypl", "avexpr", "extmort4")])
table_means
## pgp95 hjypl avexpr extmort4
## 7064.8658759 0.2928455 7.0664914 220.9264365
# Descriptive statistics for the full sample. Note: vtable::sumtable() expects
# summary functions written in terms of the placeholder x, not a column name
sumtable(cleaned_dta, summ = c("mean(x)", "sd(x)"))
sumtable(cleaned_dta)
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| base_sample | 163 | 0.393 | 0.49 | 0 | 0 | 1 | 1 |
| africa | 163 | 0.307 | 0.463 | 0 | 0 | 1 | 1 |
| asia | 163 | 0.258 | 0.439 | 0 | 0 | 1 | 1 |
| other | 163 | 0.025 | 0.155 | 0 | 0 | 0 | 1 |
| pgp95 | 163 | 7064.866 | 6984.291 | 450 | 1665 | 8624.998 | 29399.992 |
| hjypl | 163 | 0.293 | 0.234 | 0.029 | 0.108 | 0.323 | 1 |
| avexpr | 163 | 7.066 | 1.553 | 1.636 | 6.375 | 7.727 | 10 |
| extmort4 | 163 | 220.926 | 299.819 | 2.55 | 81.6 | 220.926 | 2940 |
| lat_abst | 162 | 0.296 | 0.19 | 0 | 0.144 | 0.447 | 0.722 |
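As a cross-check that needs no extra package, base R's `aggregate()` can compute the same mean/sd pairs within each sub-sample in one call. A sketch on toy data; the column names mirror the real ones but the values are hypothetical:

```r
# Toy data: the grouping dummy plus one continuous column
toy <- data.frame(base_sample = c(1, 1, 0, 0),
                  pgp95 = c(100, 300, 200, 600))

# Mean and sd of pgp95 within each value of base_sample
agg <- aggregate(pgp95 ~ base_sample, data = toy,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
agg$pgp95[, "mean"]  # 400 (base_sample = 0), 200 (base_sample = 1)
```

With the real data, `cbind(pgp95, hjypl, avexpr, extmort4) ~ base_sample` extends this to all continuous variables at once.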
glimpse(sumtable)
## function (data, vars = NA, out = NA, file = NA, summ = NA, summ.names = NA,
## add.median = FALSE, group = NA, group.long = FALSE, group.test = FALSE,
## group.weights = NA, col.breaks = NA, digits = NA, fixed.digits = FALSE,
## factor.percent = TRUE, factor.counts = TRUE, factor.numeric = FALSE,
## logical.numeric = FALSE, logical.labels = c("No", "Yes"), labels = NA,
## title = "Summary Statistics", note = NA, anchor = NA, col.width = NA,
## col.align = NA, align = NA, note.align = "l", fit.page = "\\textwidth",
## simple.kable = FALSE, opts = list())
# Standard deviations for the full sample
summarise(cleaned_dta, obs = n(),
          sd_pgp95 = sd(pgp95, na.rm = TRUE),
          sd_hjypl = sd(hjypl, na.rm = TRUE),
          sd_avexpr = sd(avexpr, na.rm = TRUE),
          sd_extmort4 = sd(extmort4, na.rm = TRUE),
          sd_lat_abst = sd(lat_abst, na.rm = TRUE))
glimpse(cleaned_dta)
## Rows: 163
## Columns: 10
## $ iso <chr> "AFG", "AGO", "ARE", "ARG", "ARM", "AUS", "AUT", "AZE", "B…
## $ base_sample <dbl> 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…
## $ africa <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0…
## $ asia <dbl> 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0…
## $ other <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ pgp95 <dbl> 7064.8659, 2369.9998, 18109.9945, 9259.9978, 2169.9996, 19…
## $ hjypl <dbl> 0.2928455, 0.0330000, 0.2928455, 0.4180000, 0.2928455, 0.8…
## $ avexpr <dbl> 7.066491, 5.363636, 7.181818, 6.386364, 7.066491, 9.318182…
## $ extmort4 <dbl> 93.7000, 280.0000, 220.9264, 68.9000, 220.9264, 8.5500, 22…
## $ lat_abst <dbl> 0.36666667, 0.13666667, 0.26666668, 0.37777779, 0.44444445…
# Descriptive statistics for the base sample (levels)
base_sample_stat <- filter(cleaned_dta, base_sample == 1)
base_sample1 <- select(base_sample_stat, pgp95:lat_abst)
sumtable(base_sample1)
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| pgp95 | 64 | 5445.458 | 6327.345 | 450 | 1480 | 6967.501 | 27329.998 |
| hjypl | 64 | 0.23 | 0.223 | 0.029 | 0.066 | 0.293 | 1 |
| avexpr | 64 | 6.516 | 1.469 | 3.5 | 5.614 | 7.352 | 10 |
| extmort4 | 64 | 245.911 | 472.624 | 8.55 | 68.9 | 240 | 2940 |
| lat_abst | 64 | 0.181 | 0.133 | 0 | 0.089 | 0.258 | 0.667 |
# Descriptive statistics after the log transformation
# (note: base_sample == 0 selects the countries outside the base sample)
base_sample_statis <- filter(cleaned_log, base_sample == 0)
base_sample2 <- select(base_sample_statis, -base_sample, -africa, -asia, -other)
sumtable(base_sample2)
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| pgp95 | 99 | 8.543 | 1.042 | 6.461 | 7.81 | 9.349 | 10.289 |
| hjypl | 99 | -1.418 | 0.919 | -3.54 | -1.554 | -0.892 | -0.014 |
| avexpr | 99 | 7.423 | 1.508 | 1.636 | 7.066 | 8.205 | 10 |
| extmort4 | 99 | 5.172 | 0.796 | 0.936 | 5.398 | 5.398 | 6.18 |
| lat_abst | 98 | 0.37 | 0.186 | 0.011 | 0.207 | 0.519 | 0.722 |
Note: the numbers in the columns of the original table correspond to means; standard deviations are reported in parentheses.
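To typeset cells in the paper's "mean (sd)" style, the two numbers can be glued together with `sprintf()`; a small sketch, using the full-sample mean and standard deviation of pgp95 from the table above:

```r
# Format a mean and a standard deviation as one "mean (sd)" cell
cell <- sprintf("%.2f (%.2f)", 7064.866, 6984.291)
cell
# "7064.87 (6984.29)"
```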
If there are difficulties with reproducing the exact table, use other conventional functions to print the same summary statistics in separate tables or as text output. For example:
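One conventional option is base R's `summary()`, shown here on a toy column; with the real data the call would simply be `summary(cleaned_dta)`:

```r
# Base R's summary() prints min/quartiles/mean/max as plain text
toy <- data.frame(pgp95 = c(450, 1665, 8625, 29400))
s <- summary(toy$pgp95)
s  # named vector with Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```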
Use the same data to calculate the means of the above-mentioned continuous variables by quartiles of mortality for the base sample only. The quartiles of mortality are:
for mortality less than 65.4;
greater than or equal to 65.4 and less than 78.1;
greater than or equal to 78.1 and less than 280;
greater than or equal to 280.
## Develop R code here...
# Mortality less than 65.4 (strictly less than, per the definition above)
Mortality <- base_sample1 %>%
  filter(extmort4 < 65.4)
summarise(Mortality, obs = n(),
          mean = colMeans(Mortality, na.rm = TRUE),
          log = log(mean))
# Mortality greater than or equal to 65.4 and less than 78.1
Mortality1 <- filter(base_sample1, extmort4 >= 65.4 & extmort4 < 78.1)
summarise(Mortality1, obs = n(),
          mean = colMeans(Mortality1, na.rm = TRUE),
          log = log(mean))
# Mortality greater than or equal to 78.1 and less than 280
Mortality2 <- filter(base_sample1, extmort4 >= 78.1 & extmort4 < 280)
summarise(Mortality2, obs = n(),
          mean = colMeans(Mortality2, na.rm = TRUE),
          log = log(mean))
# Mortality greater than or equal to 280
Mortality3 <- filter(base_sample1, extmort4 >= 280)
summarise(Mortality3, obs = n(),
          mean = colMeans(Mortality3, na.rm = TRUE),
          log = log(mean))
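The four filters above can also be expressed as a single binning step with `cut()` followed by a grouped summary. A sketch on a toy mortality vector; the breaks follow the definition in the text, and `right = FALSE` makes each interval closed on the left, i.e. [lo, hi):

```r
# Bin mortality into the four groups defined above
x <- c(10, 65.4, 70, 78.1, 100, 280, 500)  # toy values
bins <- cut(x, breaks = c(-Inf, 65.4, 78.1, 280, Inf), right = FALSE)
table(bins)          # 1, 2, 2, 2 observations per bin
tapply(x, bins, mean)  # mean mortality within each bin
```

With the real data, the bin label can be added as a column and all group means computed in one `group_by()` + `summarise()` pipeline instead of four separate filters.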
Develop two box plots, of GDP per capita and of European settler mortality, by region (Africa, Asia, Other and Latin America) for the base sample only.
## Box plot for GDP per capita
Dataframe <- select(cleaned_dta, africa:pgp95)
africa_dt <- filter(Dataframe, africa == 1)
asia_dt <- filter(Dataframe, asia == 1)
other_dt <- filter(Dataframe, other == 1)
Latin_America_dt <- filter(Dataframe, africa == 0 & asia == 0 & other == 0)
dt_boxplot <- rbind(africa_dt, asia_dt, other_dt, Latin_America_dt)
GDP <- mutate(dt_boxplot, region = africa + asia + other)
# Recode the region indicator by row blocks; the rows were stacked by continent above
attach(GDP)
GDP[51:92, 5] <- 2   # rows 51-92: Asia
GDP[93:96, 5] <- 4   # rows 93-96: other continents
GDP[97:163, 5] <- 3  # rows 97-163: Latin America
New_GDP<-GDP
class(New_GDP$region)
## [1] "numeric"
head(New_GDP$region,53)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
New_GDP$region[New_GDP$region==1]<-"Africa"
New_GDP$region[New_GDP$region==2]<-"Asia"
New_GDP$region[New_GDP$region==3]<-"Latin.America"
New_GDP$region[New_GDP$region==4]<-"Other"
# Label the axes explicitly rather than relying on attached variables
boxplot(pgp95 ~ region, data = New_GDP, col = rainbow(4),
        ylab = "GDP per capita (PPP) in 1995", xlab = "Region")
## Box plot for European settler mortality
Dataframe1 <- select(cleaned_dta, africa, asia, other, extmort4)
africa1_dta <- filter(Dataframe1, africa == 1)
asia1_dta <- filter(Dataframe1, asia == 1)
other1_dta <- filter(Dataframe1, other == 1)
Latin_America1_dta <- filter(Dataframe1, africa == 0 & asia == 0 & other == 0)
Mortality_rate <- rbind(africa1_dta, asia1_dta, other1_dta, Latin_America1_dta)
New_Mortality_rate <- mutate(Mortality_rate, region = africa + asia + other)
attach(New_Mortality_rate)
## The following objects are masked from GDP:
##
## africa, asia, other, region
New_Mortality_rate[51:92, 5] <- 2   # rows 51-92: Asia
New_Mortality_rate[93:96, 5] <- 4   # rows 93-96: other continents
New_Mortality_rate[97:163, 5] <- 3  # rows 97-163: Latin America
Plot_mortality<-New_Mortality_rate
as.factor(Plot_mortality$region)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [149] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## Levels: 1 2 3 4
Plot_mortality$region[Plot_mortality$region==1]<-"africa"
Plot_mortality$region[Plot_mortality$region==2]<-"asia"
Plot_mortality$region[Plot_mortality$region==3]<-"L.America"
Plot_mortality$region[Plot_mortality$region==4]<-"other"
boxplot(extmort4 ~ region, data = Plot_mortality, col = rainbow(4),
        ylab = "European settler mortality", xlab = "Region")
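Recoding the region by hard-coded row ranges (rows 51:92 and so on) breaks as soon as the row order changes. A more robust sketch derives the label directly from the dummies with `dplyr::case_when()`; the task's base-sample restriction could then be added with `filter(base_sample == 1)`. The toy tibble below stands in for the real data:

```r
library(dplyr)

# One toy row per region; the all-zero row is Latin America by construction
toy <- tibble(africa = c(1, 0, 0, 0),
              asia   = c(0, 1, 0, 0),
              other  = c(0, 0, 1, 0))

toy <- toy %>%
  mutate(region = case_when(africa == 1 ~ "Africa",
                            asia   == 1 ~ "Asia",
                            other  == 1 ~ "Other",
                            TRUE       ~ "Latin.America"))
toy$region
# "Africa" "Asia" "Other" "Latin.America"
```

This also removes the need for `attach()` and the masking messages it produces.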
Filter two regions, Africa and Latin America, and compare the mean mortality rates between these two groups. Perform a t-test to compare the means.
## Develop R code here…
# Only the Africa and Latin America groups are needed for the comparison,
# so they can be filtered directly from the dummies
Africa_mortality <- filter(cleaned_dta, africa == 1)
L_America_mortality <- filter(cleaned_dta, africa == 0 & asia == 0 & other == 0)
t.test(Africa_mortality$extmort4,L_America_mortality$extmort4)
##
## Welch Two Sample t-test
##
## data: Africa_mortality$extmort4 and L_America_mortality$extmort4
## t = 2.927, df = 50.737, p-value = 0.005111
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 65.78114 353.16888
## sample estimates:
## mean of x mean of y
## 366.2865 156.8115
Explain in your own words: what are the causal implications of the observed differences? In other words, if we observe a difference between the regions, does it mean that the regions themselves cause the difference in income and mortality rates?
Write your answer here: … The result shows a mortality rate more than twice as high in Africa (366.29) as in Latin America (156.81). This may be related to the lower levels of GDP in Africa compared to Latin America, which correlate positively with mortality rates: low GDP may limit the development of the health system and the food sector. However, it does not follow that the regions themselves cause the differences in income and mortality rates. Some mortality is driven by factors that cut across continents, such as epidemics, weather shocks and even pandemics such as COVID-19.
## Problem 5. Relationship between continuous variables (10 points)
Develop three scatter plots similar to Figures 1, 2 and 3 in (Acemoglu et al. 2001) and compute the corresponding correlation coefficients. Please note, the correlations need not be plotted and could be printed in a table instead.
## Develop R code here...
pairs(~ pgp95 + avexpr + extmort4, data = cleaned_dta,
      main = "Simple Scatterplot Matrix")
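The task also asks for the correlation coefficients behind these scatter plots; `cor()` prints them as a matrix, and `use = "pairwise.complete.obs"` guards against the remaining NA in lat_abst. A toy sketch of the call (with the real data it would be `cor(cleaned_dta[, c("pgp95", "avexpr", "extmort4")], use = "pairwise.complete.obs")`):

```r
# Toy columns: b is perfectly positively, c perfectly negatively, related to a
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(4, 3, 2, 1))
round(cor(toy, use = "pairwise.complete.obs"), 2)
#    a  b  c
# a  1  1 -1
# b  1  1 -1
# c -1 -1  1
```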
## Develop R code here...
regress <- lm(pgp95 ~ avexpr + other + lat_abst + asia + africa, data = cleaned_dta)
summary(regress)
##
## Call:
## lm(formula = pgp95 ~ avexpr + other + lat_abst + asia + africa,
## data = cleaned_dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8723 -3600 -1036 2480 13600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10576.8 2247.5 -4.706 5.53e-06 ***
## avexpr 2529.2 309.1 8.182 9.21e-14 ***
## other 1853.2 2591.9 0.715 0.4757
## lat_abst 3090.0 2675.3 1.155 0.2499
## asia -1133.1 1020.9 -1.110 0.2688
## africa -2817.8 1120.2 -2.516 0.0129 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5001 on 156 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5062, Adjusted R-squared: 0.4904
## F-statistic: 31.99 on 5 and 156 DF, p-value: < 2.2e-16
## Develop R code here...
# install.packages("ivreg", dependencies = TRUE)
cleaned_dta <- cleaned_dta %>%
  mutate(logpgp95 = log(pgp95),
         logextmort4 = log(extmort4),
         loghjypl = log(hjypl))
# Drop the level variable and the unused log transform
cleaned_dta <- select(cleaned_dta, -pgp95, -loghjypl)
regress <- lm(logpgp95 ~ avexpr + lat_abst + asia + africa +
                other + logextmort4, data = cleaned_dta)
summary(regress)
##
## Call:
## lm(formula = logpgp95 ~ avexpr + lat_abst + asia + africa + other +
## logextmort4, data = cleaned_dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.65317 -0.50674 0.02696 0.42997 2.19663
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.63071 0.48878 15.612 < 2e-16 ***
## avexpr 0.25203 0.04770 5.283 4.24e-07 ***
## lat_abst 0.69596 0.41858 1.663 0.09840 .
## asia -0.32328 0.15588 -2.074 0.03975 *
## africa -0.75355 0.17849 -4.222 4.12e-05 ***
## other -0.16774 0.42349 -0.396 0.69258
## logextmort4 -0.19030 0.06807 -2.796 0.00583 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7611 on 155 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.511, Adjusted R-squared: 0.492
## F-statistic: 26.99 on 6 and 155 DF, p-value: < 2.2e-16
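The commented-out `install.packages("ivreg")` above points towards the paper's instrumental-variables step, in which log settler mortality instruments avexpr; with the real data the call would look something like `ivreg::ivreg(logpgp95 ~ avexpr + lat_abst | logextmort4 + lat_abst, data = cleaned_dta)`. As a package-free illustration of the idea, here is two-stage least squares by hand on simulated data; all names and numbers below are made up:

```r
# Simulated endogeneity: u shifts both the regressor and the outcome,
# so OLS is biased upwards while the instrument z recovers the true slope of 2
set.seed(42)
n <- 2000
z <- rnorm(n)                    # instrument (stand-in for logextmort4)
u <- rnorm(n)                    # unobserved confounder
x <- z + u + rnorm(n)            # endogenous regressor (stand-in for avexpr)
y <- 2 * x + 3 * u + rnorm(n)    # outcome (stand-in for logpgp95)

xhat <- fitted(lm(x ~ z))        # first stage: project x on the instrument
iv_slope <- coef(lm(y ~ xhat))[["xhat"]]  # second stage
iv_slope                         # close to 2
coef(lm(y ~ x))[["x"]]           # OLS slope, pushed well above 2 by u
```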
Acemoglu, Daron, Simon Johnson, and James A. Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91 (5): 1369–1401. https://doi.org/10.1257/aer.91.5.1369
To start with reproducible research using R Markdown, see Chapter 26 of R4DS. A more comprehensive guide to reproducible research with R and its programming side is R Markdown: The Definitive Guide. Finally, the definitive treatment is the book Reproducible Research with R and RStudio.↩︎
As you may know, computers are quite inaccurate when it comes to computations with floating-point (non-integer) numbers. Roughly speaking, beyond 16 digits after the decimal point the computer returns meaningless digits. See, for example, “Circle 1. Falling into the Floating Point Trap” (pp. 9-11) in The R Inferno, or read more in the Floating-point arithmetic article. Or guess what R would return for the comparison .1 == .3/3↩︎
Unfortunately, there are reasonable limits to how much the grade can be improved: the maximum is 100%, which cannot be exceeded, and 80% of the grade is still about reproducing the statistical results from the template solution, which cannot be compensated by improved themes for plots or tables.↩︎